Two robots of the same type#
The code for this example is implemented in same_robots. Let us import it.
[3]:
from enki_env.examples import same_robots
Environment#
To create the environment via script, run:
python -m enki_env.examples.same_robots.environment
[4]:
env = same_robots.make_env(render_mode="human")
env.reset()
env.snapshot()
The robots belong to the same "thymio" group and share the same configuration.
[5]:
env.group_map
[5]:
{'thymio': ['thymio_0', 'thymio_1']}
As in the single-robot example, the robots use only their proximity sensors and receive a similar reward, which encourages them to rotate until they face each other, at which point the episode terminates.
[6]:
env.action_spaces
[6]:
{'thymio_0': Box(-1.0, 1.0, (1,), float64),
'thymio_1': Box(-1.0, 1.0, (1,), float64)}
[7]:
env.observation_spaces
[7]:
{'thymio_0': Dict('prox/value': Box(0.0, 1.0, (7,), float64)),
'thymio_1': Dict('prox/value': Box(0.0, 1.0, (7,), float64))}
Baseline#
We have hand-coded a simple distributed policy to achieve the task.
To evaluate the baseline via script, run:
python -m enki_env.examples.same_robots.baseline
[8]:
import inspect
print(inspect.getsource(same_robots.Baseline.predict))
def predict(self,
            observation: Observation,
            state: State | None = None,
            episode_start: EpisodeStart | None = None,
            deterministic: bool = False) -> tuple[Action, State | None]:
    prox = observation['prox/value']
    if any(prox > 0):
        prox = prox / np.max(prox)
        ws = np.array((0.5, 0.25, 0, -0.25, -0.5, 1, 1))
        w = np.dot(ws, prox)
    else:
        w = 1
    return np.clip([w], -1, 1), None
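To see how this rule behaves, the following minimal sketch reproduces the same arithmetic outside the environment, using only NumPy. The function name `baseline_action` and the sample readings are hypothetical and chosen for illustration. The source does not state which entry of `prox/value` corresponds to which physical sensor, so the comments refer to entries only by index.

```python
import numpy as np

# Weights copied from the baseline above, one per entry of 'prox/value'.
WEIGHTS = np.array((0.5, 0.25, 0, -0.25, -0.5, 1, 1))


def baseline_action(prox: np.ndarray) -> np.ndarray:
    """Standalone copy of the baseline rule (hypothetical helper, not part of enki_env)."""
    if np.any(prox > 0):
        # Normalize so that the strongest reading equals 1.
        prox = prox / np.max(prox)
        w = float(np.dot(WEIGHTS, prox))
    else:
        # Nothing detected: keep rotating at full speed.
        w = 1.0
    return np.clip([w], -1, 1)


# No sensor fires: the action is the maximal rotation.
no_detection = baseline_action(np.zeros(7))  # [1.0]

# Readings concentrated on the first two entries: 0.5 * 1 + 0.25 * 0.25 = 0.5625.
one_sided = baseline_action(np.array([0.8, 0.2, 0, 0, 0, 0, 0]))  # [0.5625]

# Readings symmetric around the third entry cancel out, so the robot stops rotating.
symmetric = baseline_action(np.array([0, 0.3, 0.6, 0.3, 0, 0, 0]))  # [0.0]

# Only the last two entries fire: the weighted sum is 2, clipped to 1.
last_two = baseline_action(np.array([0, 0, 0, 0, 0, 0.5, 0.5]))  # [1.0]
```

The normalization makes the action depend on the relative pattern of the readings rather than on their absolute intensity, and the zero weight on the third entry means a reading centered there produces no rotation.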
To perform a rollout, we need to assign the policy to the whole group.
[10]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': same_robots.Baseline()})
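Conceptually, assigning a policy to a group means that every robot in that group is driven by the same policy. The sketch below illustrates this idea with plain dictionaries. The helper `expand_group_policies` and the `DummyPolicy` class are hypothetical and are not part of enki_env, whose internal implementation may differ.

```python
class DummyPolicy:
    """Placeholder standing in for a real policy such as same_robots.Baseline()."""


def expand_group_policies(group_map, policies):
    """Map each agent name to the policy assigned to its group (hypothetical helper)."""
    return {
        agent: policies[group]
        for group, agents in group_map.items()
        for agent in agents
    }


# Same structure as env.group_map shown earlier.
group_map = {'thymio': ['thymio_0', 'thymio_1']}
shared_policy = DummyPolicy()

assignments = expand_group_policies(group_map, {'thymio': shared_policy})
# Both robots end up with the very same policy object.
```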
For multi-robot environments, a rollout returns a dictionary with the data collected from each group:
[11]:
rollout.keys()
[11]:
dict_keys(['thymio'])
[12]:
rollout['thymio'].episode_reward
[12]:
np.float64(-23.985014107779367)
Reinforcement Learning#
Let us now train and evaluate an RL policy for the same task.
To perform this via script, run:
python -m enki_env.examples.same_robots.rl
[13]:
policy = same_robots.get_policy()
[14]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': policy})
rollout['thymio'].episode_reward
[14]:
np.float64(-28.58025193565431)
Video#
To generate a video similar to the one in the single-robot example, run
python -m enki_env.examples.same_robots.video
or run
[13]:
video = same_robots.make_video()
video.display_in_notebook(fps=30, width=640, rd_kwargs=dict(logger=None))
MoviePy - Building video __temp__.mp4.
MoviePy - Writing video __temp__.mp4
MoviePy - Done !
MoviePy - video ready __temp__.mp4